R is a software environment for statistical computing and graphics. Using R you can do rigorous statistical analysis, clean and manipulate data, and create publication-quality graphics.
Packages are programs that you import into R to help make tasks easier. The most popular R packages for working with data include dplyr, stringr, tidyr, and ggplot2.
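As a minimal sketch of how packages are usually installed and loaded: the install step below is commented out because it needs an internet connection and a configured CRAN mirror, and the variable names `pkgs` and `installed` are only illustrative.

```r
# Packages named in the text above
pkgs <- c("dplyr", "stringr", "tidyr", "ggplot2")

# Install once per machine (needs an internet connection); uncomment to run
# install.packages(pkgs)

# requireNamespace() returns TRUE if a package is installed and can be loaded,
# FALSE otherwise; it does not attach the package to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Attach an installed package for the current R session
if (installed[["ggplot2"]]) {
  library(ggplot2)
}
```

Installing is done once per machine, while `library()` has to be called in every new R session that uses the package.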
There’s no easy way (yet) for new R users to find R packages that they might need. People are working on this problem. In the meantime, consult the following list or ask a Librarian!
Resources include:
You can create graphs in R without installing a package, but packages will allow you to create better visualizations that are any of the following:
ggplot2 is the most popular visualization package for R. It’s the best all-purpose package for creating many types of 2-dimensional visualizations.
library(highcharter)  # provides highchart() and the citytemp sample data
data(citytemp)
hc <- highchart() %>%
  hc_xAxis(categories = citytemp$month) %>%
  hc_add_series(name = "Tokyo", data = citytemp$tokyo) %>%
  hc_add_series(name = "London", data = citytemp$london) %>%
  hc_add_series(name = "Other city",
                data = (citytemp$tokyo + citytemp$london) / 2)
hc  # Print the chart
library(leaflet)  # provides leaflet(), addTiles(), and addMarkers()
m <- leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng = -78.6697, lat = 35.7876,
             popup = "You are here")
m  # Print the map
library(plotly)  # provides plot_ly(); economics is a sample dataset from ggplot2
p <- plot_ly(economics, x = ~date, y = ~unemploy / pop)
p  # Print the plot
ggplot2 is built on the principles of the Layered Grammar of Graphics (2010) by Hadley Wickham, which in turn builds on work by Wilkinson, Anand, & Grossman (2005) and Jacques Bertin (1983).
Essentially: graphs are like sentences you can construct, and they have a grammar. The grammar of graphics consists of the following:
at least one layer:
scale
coordinate system
facet (optional)
These components make up a graph.
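The components listed above can be written out explicitly in code. The following is only a sketch: it assumes ggplot2 is installed and uses mpg, a sample dataset that ships with ggplot2. The axis label and the choice of `drv` for faceting are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  # layer: data plus an aesthetic mapping drawn with a geometric object
  geom_point(mapping = aes(x = displ, y = hwy)) +
  # scale: controls how data values map to positions on the x-axis
  scale_x_continuous(name = "Engine displacement (litres)") +
  # coordinate system: Cartesian is the default, written out here
  coord_cartesian() +
  # facet (optional): one panel per drive type
  facet_wrap(~ drv)

p
```

In everyday use the scale and coordinate system are usually left out, because ggplot2 supplies defaults for both.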
Open RStudio.
Download the following file: script.R
In RStudio, choose File > Open File…
Select the script.R file that you just downloaded
Click Open
Let’s see an example of a simple graph created with ggplot. We are going to use the mpg data set about different cars and their properties.
Exercise #1: In your script file, run ?mpg to learn more about this dataset. To run the code, highlight it and then click Run. (shortcut keys: Mac: command + Enter, Windows: CTRL + Enter)
Exercise #2: Run head(mpg) to see the first few rows of the data.
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr>
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p
## 3 audi         a4      2.0  2008     4 manual(m6) f        20    31 p
## 4 audi         a4      2.0  2008     4 auto(av)   f        21    30 p
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p
## # ... with 1 more variables: class <chr>
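Beyond head(), a few base R functions give a quick overview of a dataset. This is a small sketch, assuming ggplot2 is installed so that mpg is available:

```r
library(ggplot2)  # provides the mpg dataset

dim(mpg)          # number of rows and number of columns
names(mpg)        # all column names, including the class column
str(mpg)          # type and first few values of every column
summary(mpg$hwy)  # minimum, quartiles, mean, and maximum of highway mileage
```

names() is handy when the printed table is too wide to show every column.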
The graph below uses ggplot2 to look for correlation between a car’s engine displacement and highway mileage.
library(ggplot2): loads the ggplot2 package
ggplot(): function that tells R that you want to make a graph with ggplot2
data = mpg: says that you want to use the mpg dataset (sample data that comes with ggplot2)
geom_point(): function that says you want to make a scatterplot
mapping = aes(): argument and function that map data variables to aesthetics, here the x and y axes
**Run the following code in your script file:**
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
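To back up what the scatterplot suggests with a number, one option is base R's cor() function. This is a sketch rather than part of the original exercise, and the variable name `r` is illustrative:

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between engine displacement and highway mileage
r <- cor(mpg$displ, mpg$hwy)
r

# A negative value means cars with larger engines tend to get
# lower highway mileage
r < 0
```

A correlation coefficient only summarizes a straight-line relationship, so it complements the plot rather than replacing it.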
Make a scatterplot with cyl mapped to the x-axis and hwy mapped to the y-axis.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
Make a scatterplot with displ on the x-axis, hwy on the y-axis, and class mapped to the color aesthetic. Run:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
The type of drive system the car has (4-wheel, rear-wheel, and front-wheel) is mapped to color.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))
Variables can be mapped to the following aesthetic parameters. If you are publishing in black and white and can’t use color, you might want to use size or shape:
color
size
shape
alpha - transparency
Substitute another aesthetic in place of color. Run the code:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))
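Two possible substitutions are sketched here, one using shape and one using size. The object names `p_shape` and `p_size` are illustrative, and mapping cyl to size is just one choice among many:

```r
library(ggplot2)

# Drive type mapped to point shape instead of color (works in black and white)
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
p_shape

# Number of cylinders mapped to point size
p_size <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = cyl))
p_size
```

Shape suits categorical variables such as drv, while size is easier to read for numeric variables.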
Facets are a way to create multiple smaller charts, or subplots, based on a variable. Run this code to see what faceting does:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
Substitute another variable from the dataset in place of class.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
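Faceting tends to work best with variables that take only a handful of distinct values, since each value gets its own panel. One way to spot candidates is to count the distinct values in every column. This is a sketch, and the name `n_distinct_values` is illustrative:

```r
library(ggplot2)  # provides the mpg dataset

# Count the distinct values in each column of mpg
n_distinct_values <- vapply(mpg, function(col) length(unique(col)), integer(1))

# Columns with the fewest distinct values come first
sort(n_distinct_values)
```

Columns near the top of the sorted result, such as drv or cyl, produce a small number of panels and are reasonable ones to try in facet_wrap().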